Although primary genomic analysis has revealed that severe acute respiratory syndrome coronavirus (SARS CoV) is a new type of coronavirus, the different protein trees published in previous reports have provided no conclusive evidence of the phylogenetic position of SARS CoV. To clarify the phylogenetic relationship between SARS CoV and other coronaviruses, we compiled a large data set composed of 7 concatenated protein sequences and performed comprehensive analyses using the maximum-likelihood, Bayesian-inference, and maximum-parsimony methods. All resulting phylogenetic trees displayed an identical topology and supported the hypothesis that SARS CoV and group 2 CoVs are monophyletic. Relationships among all major groups were well resolved and were supported by all statistical analyses.

In the short time since a novel coronavirus was identified as the cause of the ongoing worldwide outbreak of severe acute respiratory syndrome (SARS) [1], several SARS coronavirus (CoV) isolates have been cloned, and several complete genomic sequences have been determined [2-4]. Preliminary sequence analyses have indicated that SARS CoV is a new type of coronavirus that does not belong to any coronavirus group yet characterized [2, 3]. However, the phylogenetic position and origin of SARS CoV remain elusive. Most reported phylogenetic analyses have been based on individual proteins [2, 3, 5], short nucleotide sequences [1], or whole-genome similarity [6]. Although these analyses have not yielded conflicting results, the phylogenetic relationship between SARS CoV and its relatives remains inconclusive. Whereas most reported phylogenetic trees have placed SARS CoV between group 2 and group 3 CoVs, a few have grouped SARS CoV with group 3 CoVs [1-3, 5].

It is notable that both SARS CoV and infectious bronchitis virus (IBV), a group 3 CoV, form long branches in all reported phylogenetic trees, because this suggests that the tree reconstructions may have been affected by a long-branch attraction (LBA) artifact [7]. An LBA artifact can arise from limited taxon sampling, a small number of amino acid or nucleotide positions, or highly variable regions of nucleotide sequences, any of which can introduce misleading phylogenetic signal during tree reconstruction. To further define the phylogenetic relationship between SARS CoV and other coronaviruses, we took advantage of recently published SARS CoV genome sequences and constructed a single large data set composed of 3364 well-aligned amino acid positions. Because the long branches appeared to differ among the 7 proteins used to construct the data set [1-3, 5], concatenating them should minimize the effect of an LBA artifact on tree reconstruction. The aim of the present study was to apply reliable analytical methods to construct a robust hypothesis of the phylogeny of SARS CoV in relation to other coronaviruses.

Materials and methods. We retrieved the following protein sequences (accession numbers) from GenBank: SARS CoV (NC_004718), human CoV 229E (AF304460), porcine epidemic diarrhea virus (AF353511), transmissible gastroenteritis virus (AJ271965), bovine CoV (AF220295), murine hepatitis virus (AF201929), and IBV (M95169). The amino acid sequences of the 3CLpro, POL, HEL, S, E, M, and N proteins were individually aligned with the Clustal X program (version 1.83). Gaps and ambiguously aligned regions were excluded from each alignment. After a parsimony-based partition-homogeneity test revealed no significant incongruence between trees derived from different proteins, the 7 protein alignments were concatenated to form a large data set of 3364 aa positions for subsequent phylogenetic analysis. The partition-homogeneity test was performed with the PAUP* program [8]. The homogeneity among the structural proteins (S, E, M, and N), among the enzymatic proteins (3CLpro, POL, and HEL), and between the 2 protein categories (structural vs. enzymatic) was tested using a heuristic search algorithm with 100 replicates; none of these tests showed statistically significant incongruence (among enzymatic proteins, P = 1; among structural proteins, P = .26; between enzymatic and structural proteins, P = 1).
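The column filtering and concatenation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy sequences and the function names `drop_gap_columns` and `concatenate` are hypothetical, only columns containing a gap are removed (screening of ambiguously aligned regions, which requires judgement, is not modeled), and the partition-homogeneity test itself is left to PAUP*.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned amino acid sequence


def drop_gap_columns(aln: Alignment) -> Alignment:
    """Remove every alignment column in which any taxon has a gap ('-')."""
    taxa = list(aln)
    length = len(aln[taxa[0]])
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("all sequences in an alignment must have equal length")
    keep = [i for i in range(length) if all(aln[t][i] != "-" for t in taxa)]
    return {t: "".join(aln[t][i] for i in keep) for t in taxa}


def concatenate(alignments: List[Alignment]) -> Alignment:
    """Join gap-free per-protein alignments into one supermatrix, taxon by taxon."""
    taxa = set(alignments[0])
    if any(set(a) != taxa for a in alignments):
        raise ValueError("every alignment must contain the same taxa")
    cleaned = [drop_gap_columns(a) for a in alignments]
    return {t: "".join(a[t] for a in cleaned) for t in sorted(taxa)}


# Hypothetical toy alignments for three taxa (not real coronavirus sequences).
protein_e = {"SARS": "MY-SF", "BCoV": "MFMAD", "IBV": "MN-LL"}
protein_m = {"SARS": "ADNGT", "BCoV": "SSVTT", "IBV": "-ENCT"}

supermatrix = concatenate([protein_e, protein_m])
print(supermatrix)
print(len(supermatrix["SARS"]))  # 8 positions survive: 4 per toy protein
```

Concatenating only after filtering each protein separately keeps the column bookkeeping per protein, which mirrors the per-protein alignment described in the text.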

Figure 1. Unrooted best maximum-likelihood (ML) tree (−ln L = 42,346.53), inferred from 3364 amino acid positions of 7 concatenated protein sequences obtained from 7 taxa, including severe acute respiratory syndrome coronavirus (SARS CoV) and 6 other coronaviruses. The supporting values from ML quartet-puzzling (MLQP), Bayesian-inference posterior-probability (BIPP), and maximum-parsimony bootstrapping analyses (MPPA) are indicated. BCoV, bovine coronavirus; HCoV-229E, human coronavirus 229E; IBV, infectious bronchitis virus; MHV, murine hepatitis virus; PEDV, porcine epidemic diarrhea virus; TGEV, transmissible gastroenteritis virus.

Table 1. Pairwise distances among coronaviruses (CoVs), corrected by the maximum-likelihood model.

                     SARS CoV   HCoV-229E  PEDV       TGEV       BCoV       MHV
                     (group 4)  (group 1)  (group 1)  (group 1)  (group 2)  (group 2)
HCoV-229E (group 1)  1.99328
PEDV (group 1)       1.93334    0.65274
TGEV (group 1)       1.90590    0.78901    0.75581
BCoV (group 2)       1.44381    2.00201    1.92531    1.89916
MHV (group 2)        1.41557    1.97320    1.87735    1.81852    0.27096
IBV (group 3)        2.00554    2.18897    2.09103    2.10307    2.01146    1.98161

NOTE. BCoV, bovine coronavirus; HCoV-229E, human coronavirus 229E; IBV, infectious bronchitis virus; MHV, murine hepatitis virus; PEDV, porcine epidemic diarrhea virus; SARS CoV, severe acute respiratory syndrome coronavirus; TGEV, transmissible gastroenteritis virus.

For phylogenetic analysis, the Tree-Puzzle program (version 5.0) was first used to generate approximate quartet-likelihood trees, with 1000 puzzling steps [9]. Parameters were estimated on the basis of the topology of a neighbor-joining tree, and amino acid frequencies were estimated from the concatenated protein data set, using a Jones-Taylor-Thornton (JTT) model of amino acid substitution that accounted for rate heterogeneity (i.e., a fraction of invariable sites and a 4-category gamma distribution of rates [JTT + F_inv + Γ]). The parameters established in the puzzling analysis were then applied to a full maximum-likelihood (ML) analysis with the ProML program of the PHYLIP package [10], with the sequence input order randomized and global rearrangements enabled during the tree search. In addition, phylogenetic trees were reconstructed under the same JTT model by a Bayesian-inference (BI) method, using the MrBayes program (version 3.0) [11]. A total of 100,000 generations were run, with 4 chains running simultaneously. Stable likelihood values were reached within the first 1000 generations, indicating that the Markov chain Monte Carlo analysis had been run for a sufficient number of generations. Posterior probabilities at tree nodes were obtained by computing the 50% majority-rule consensus tree of the best 901 BI trees. Bootstrapping (1000 replicates) was performed with the maximum-parsimony (MP) method in the PAUP* program [8], with the input order of each search randomized (100 replicates). A full heuristic search was used to find the best trees, with tree-bisection-reconnection branch swapping. The statistical significance of the difference between each resulting best tree and all alternative trees was tested by both the Kishino-Hasegawa (KH) method [12] and the Shimodaira-Hasegawa (SH) method [13].
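The 50% majority-rule step, which turns a sample of BI trees into posterior probabilities at nodes, can be illustrated with a short Python sketch. It assumes each sampled tree has already been reduced to its list of internal splits (bipartitions of the taxa); the function names, the reference-taxon convention, and the four-tree sample are hypothetical illustrations, not MrBayes or PAUP* output.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

Split = FrozenSet[str]


def normalize(side: Iterable[str], taxa: FrozenSet[str], reference: str) -> Split:
    """Represent a bipartition by whichever side does NOT contain the reference taxon."""
    side_set = frozenset(side)
    return taxa - side_set if reference in side_set else side_set


def majority_rule(trees: List[List[Iterable[str]]], taxa: FrozenSet[str],
                  reference: str, threshold: float = 0.5) -> List[Tuple[Split, float]]:
    """Return the splits present in more than `threshold` of the sampled trees.

    Each tree is given as a list of its internal splits (one side of each bipartition).
    The fraction of sampled trees containing a split estimates its posterior probability.
    """
    counts: Counter = Counter()
    for tree in trees:
        # Build a set first so a split listed twice in one tree is counted once.
        counts.update({normalize(split, taxa, reference) for split in tree})
    n = len(trees)
    kept = [(split, c / n) for split, c in counts.items() if c / n > threshold]
    # Most frequent splits first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[1], sorted(item[0])))


# Hypothetical posterior sample of four 5-taxon trees (not real MrBayes output).
TAXA = frozenset({"SARS", "BCoV", "MHV", "IBV", "TGEV"})
sample = [
    [{"BCoV", "MHV"}, {"BCoV", "MHV", "SARS"}],
    [{"BCoV", "MHV"}, {"BCoV", "MHV", "SARS"}],
    [{"BCoV", "MHV"}, {"IBV", "SARS"}],
    [{"BCoV", "MHV"}, {"BCoV", "MHV", "SARS"}],
]

for split, prob in majority_rule(sample, TAXA, reference="TGEV"):
    print(sorted(split), prob)
```

With this toy sample the sketch keeps the BCoV + MHV split (1.0) and the BCoV + MHV + SARS split (0.75) and discards the IBV + SARS split (0.25), because splits appearing in more than half of the trees are always mutually compatible and so form a single consensus tree.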

Results. The data set was composed of 7 concatenated protein sequences (i.e., 3CLpro, POL, HEL, S, E, M, and N) obtained from 7 coronavirus isolates, and a partition-homogeneity test [14] revealed no significant incongruence between phylogenetic trees derived from different proteins. The best trees were inferred from the data set by the ML, BI, and MP methods [8-11], all of which yielded the same tree topology and supported the hypothesis that SARS CoV and group 2 CoVs are monophyletic (figure 1). The statistical support at every node was 100% in the ML quartet-puzzling, BI posterior-probability, and MP bootstrapping analyses. Pairwise comparison of protein distances, corrected by the ML method, also showed that the intergroup distance between SARS CoV and group 2 CoVs was shorter than the distances between SARS CoV and the other coronaviruses (table 1).

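The distance comparison can be checked directly against the SARS CoV column of table 1. The Python sketch below averages those distances within each coronavirus group; averaging by group is an illustrative summary chosen here, not a statistic reported in the study, and the dictionary names are hypothetical.

```python
from statistics import mean

# Distances from SARS CoV to each other virus, copied from the SARS CoV column of table 1.
SARS_DISTANCES = {
    "HCoV-229E": 1.99328,
    "PEDV": 1.93334,
    "TGEV": 1.90590,
    "BCoV": 1.44381,
    "MHV": 1.41557,
    "IBV": 2.00554,
}

# Group assignments as labelled in table 1.
GROUPS = {
    "group 1": ["HCoV-229E", "PEDV", "TGEV"],
    "group 2": ["BCoV", "MHV"],
    "group 3": ["IBV"],
}

# Mean distance from SARS CoV to the members of each group (illustrative summary only).
group_means = {
    group: mean(SARS_DISTANCES[virus] for virus in members)
    for group, members in GROUPS.items()
}
closest = min(group_means, key=group_means.get)

for group, value in group_means.items():
    print(f"{group}: {value:.5f}")
print("closest to SARS CoV:", closest)
```

The printed means are about 1.94417 (group 1), 1.42969 (group 2), and 2.00554 (group 3), so group 2 is closest whether one compares means or individual entries.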
The SARS CoV + group 2 CoVs clade was then joined by IBV, a group 3 CoV. SARS CoV and IBV were more clearly separated in the trees from the present study than in previously reported protein trees, in which SARS CoV and IBV were either minimally separated by very short branches or artificially joined at deep branches [1-3]. The hypothesis that SARS CoV and group 2 CoVs are monophyletic was fully supported by the KH test, in which the ML values of all 944 other possible trees were significantly worse than that of the present best tree. When the more conservative SH test was used, only 19 suboptimal trees did not differ significantly in their ML values from the tree shown in figure 1. Of these 19 trees, 13 supported a monophyletic relationship between SARS CoV and group 2 CoVs, and only 6 either placed SARS CoV at the base of the group 2 and group 3 CoVs or identified it as a sister to group 3 CoVs. Therefore, the SH test did not reject the best tree but, rather, implied strong links among SARS CoV, group 2 CoVs, and group 3 CoVs. In addition, a monophyletic relationship between SARS CoV and group 2 CoVs is supported by a recently reported phylogenetic analysis of the replicase gene [15].
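The figure of 944 alternative trees is consistent with the standard count of unrooted, fully bifurcating topologies for n labelled taxa, (2n − 5)!!, which gives 945 for the 7 taxa used here, i.e., the best tree plus 944 others. A minimal Python check of that arithmetic follows; the function name is a hypothetical label for illustration.

```python
def unrooted_topologies(n_taxa: int) -> int:
    """Count distinct unrooted, fully bifurcating trees for n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("an unrooted tree needs at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


total = unrooted_topologies(7)
print(total)      # 945
print(total - 1)  # 944 alternatives to the best tree
```

Because the count grows as a double factorial, exhaustive evaluation of every topology is only practical for small taxon sets such as this one.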

Discussion. Despite the evidence supporting a monophyletic relationship between SARS CoV and group 2 CoVs, the data currently available still support the preliminary conclusion that SARS CoV might be a new type of coronavirus (i.e., group 4). The origin of SARS CoV cannot be resolved here; doing so may require the identification and sequencing of additional, closely related coronaviruses from humans and/or animals. Nonetheless, establishing a solid phylogenetic relationship between SARS CoV and other coronaviruses may provide valuable information for the development of vaccines and therapeutics and may, in the near future, help shed light on the true origin of SARS CoV.